New Generalized-Ensemble Monte Carlo Method 
for Systems with Rough Energy Landscape 



Ulrich H.E. Hansmann Q and Yuko Okamoto 



Department of Theoretical Studies 
Institute for Molecular Science 
Okazaki, Aichi 444^ Japan 



ABSTRACT 

We present a novel Monte Carlo algorithm which enhances equilibrization of low- 
temperature simulations and allows sampling of configurations over a large range of ener- 
gies. The method is based on a non-Boltzmann probability weight factor and is another 
version of the so-called generalized-ensemble techniques. The effectiveness of the new 
approach is demonstrated for the system of a small peptide, an example of the frustrated 
system with a rugged energy landscape. 
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The energy landscape of many important physical systems is characterized by a huge 
number of local minima separated by high energy barriers. In the canonical ensemble 
with temperature T, the probability to cross an energy barrier of heights AE is propor- 
tional to q-^e/^bT ^ where ks is the Boltzmann constant. Hence, at low temperatures, 
canonical molecular dynamics and Monte Carlo simulations will get trapped in configu- 
rations corresponding to one of these local minima. Only small parts of the entire phase 
space can be explored, rendering the calculation of physical quantities unreliable. 

In principle, one can think of two ways to overcome this difficulty. One way is to 
look for improved updates of configurations in the numerical simulation. The cluster 
algorithm is an example of global updates that enhance thermalization and has been 
very successful in spin systems. However, for most other systems with frustration, no 
such updates are known. Another way to overcome the supercritical slowing down is 
to perform a simulation in a so-called generalized ensemble, which is based on a non- 
Boltzmann probability distribution. Multicanonical algorithm ^, l//c-sampling [Q, 
and simulated tempering f^, ^ are prominent examples of such an approach. Common to 
the three techniques is that a molecular dynamics or Monte Carlo simulation is performed 
in an artificial ensemble defined in such a way that a uniform (non-canonical) distribution 
of the chosen physical quantity is obtained. For instance, in the multicanonical algorithm 
the weight Wmu{E) is chosen so that the distribution of energy is uniform: 

P{E) oc n{E) WmuiE) = const, (1) 

where n{E) is the density of states. A simulation based on this weight factor results 
in a free random walk in the energy space. Hence, the simulation can escape from any 
energy barrier, and even regions with small n{E) can be explored in detail. Similarly, 1/k- 
sampling yields a uniform distribution in (microcanonical) entropy, and simulated tem- 
pering a uniform distribution in temperature. The great advantage of these generalized- 
ensemble methods lies in the fact that from a single simulation run one can not only 
locate the energy global minimum but also obtain the canonical distribution for a wide 
temperature range by the reweighting techniques [0]. 

Despite their successful applications to systems with first-order phase transitions 0, 
spin glasses , and the protein folding problem P, |lOl , generalized-ensemble methods are 



not without problems. Unlike in the canonical ensemble, the probability weights are not 
a priori known. For instance, for the case of multicanonical algorithm, Eq. (|l]) implies 

Wmu{E) ^ n'\E) , (2) 

and the knowledge of the exact weight would be equivalent to obtaining the density of 
states n{E), i.e., solving the system. Hence, one needs its estimator for a numerical 
simulation. The determination of the weight Wmu{E) is usually based on an iterative 
procedure first described in Ref. 0, and can be non-trivial and tedious. In this Letter, 
we present a new generalized-ensemble algorithm in which the determination of the weight 
is simple and straightforward. 

Our aim is to develope a new generalized-ensemble algorithm in which the determi- 
nation of the probability weight factor is simpler. For this, we try to slightly modify the 
Boltzmann weight, whereas other generalized-ensemble approaches use drastically differ- 
ent weights. The weight should enhance the thermalization of low-temperature simula- 
tions and ensure sufficient sampling in the low-energy region. Hence, we are interested in 
an ensemble where not only the low-energy region can be sampled efficiently but also the 
high-energy states can be visited with finite probability. The latter feature ensures that 
energy barriers can be overcome and that the simulation can escape from local minima. 
The probability distribution of energy should resemble that of an ideal low-temperature 
Boltzmann distribution, but with a tail to higher energies. One choice is that the sam- 
pling of low-energy states is described by an exponential function (Boltzmann weight), 
while that of high-energy states follows a power law. Guided by these considerations, we 
propose the following as the new weight: 

w{E)=(l + P^^^y^ , (3) 



m 

where (3 = Eqs is the global-minimum energy, and m (> 0) is a free parameter. 
Here, we are shifting the zero of energy by Eqs in order to assure that energy is always 
non-negative. We remark that weights with the same mathematical structure also appear 
in the framework of Tsallis generalized statistical mechanics [0, which was developed 
for simulations of non-extensive systems (e.g., fractal random walks). An application to 
optimization problems can be found in Ref. ||12 |. 



Obviously, the new weight in Eq. reduces to the canonical Boltzmann weight in 
the low-energy (and hence low-temperature) region for <^ i. On the other hand, 

this weight at high energies is no longer exponentially suppressed, but only according to 
a power law with the exponent m. Note that our choice of sign in Eq. (^) is important. 
lYiom. a mathematical point of view, (1 — I3 ^~^^^ )^ is equally a good approximation 
to the canonical weight, but is not useful as a weight in numerical simulations, since the 
expression inside the parentheses can become negative. 

In this work we consider a system with continuous degrees of freedom. At low tem- 
peratures the harmonic approximation holds, and the density of states is given by 



n{E) ^ {E - Egs)"^ , (4) 

where np is the number of degrees of freedom of the system under consideration. Hence, 
by Eqs. @ and (§) the probability distribution of energy for the present ensemble is given 
by 

n p 

P{E) oc n{E)w{E) oc {E - Egs)~~'^ , (5) 
for P^-^^ ^ 1- This implies that we need 

m> — . (6) 

For, otherwise, the sampling of high-energy configurations will be enhanced too much. On 
the other hand, in the limit m — > oo our weight tends for all energies to the Boltzmann 
weight and high-energy configurations will not be sampled. 

In order for low-temperature simulations to be able to escape from energy local minima, 
the weight should start deviating from the (exponentially damped) Boltzmann weight 
at the energy near its mean value (because at low temperatures there are only small 
fluctuations of energy around its mean). In Eq. (H) we may thus set 

^ <E>r -Eos ^1 
m 2 

The mean value at low temperatures is given by the harmonic approximation: 

<E>T =Egs + "y^bT = Egs + ^ . (8) 



Substituting this value into Eq. (|^), we obtain the following optimal value for the exponent 
m: 

rriopt = np • (9) 
Hence, the optimal weight factor is given by 

w{E)= l + f3 , (10) 



np 

where Eq is the best estimate of the global- minimum energy Eos- 

We have tested our new method in the system for the protein folding problem, a 
long-standing problem in biophysics with rough energy landscape. Here, Met-enkephalin 
has become an often-used model to examine the performance of new algorithms, and we 
study the same system. Met-enkephalin has the amino-acid sequence Tyr-Gly-Gly-Phe- 
Met. The energy function Efot (in kcal/mol) that we used is given by the sum of the 
electrostatic term Ec, 12-6 Lennard- Jones term Elj, and hydrogen-bond term Ehb for 
all pairs of atoms in the peptide together with the torsion term Etor for all torsion angles: 

Etot = Ec + Elj + Ehb + Etor , (11) 
Ec = (12) 

Eu = Ef^-^). (13) 

Ehb = (14) 

Etor = E^Kl±cos(n,x/)). (15) 

Here, rij is the distance (in A) between the atoms i and j, and xi is the torsion angle for 
the chemical bond /. The parameters for the energy function and the molecular geometry 
(with fixed bond lengths and bond angles) were adopted from ECEPP/2 (Empirical Con- 
formational Energy Program for Peptides) [?]. The dielectric constant e was set equal to 
2. Fixing the peptide bond angles uo to 180° leaves us with 19 torsion angles as indepen- 



dent degrees of freedom (i.e., np = 19). The computer code KONF90 |jT3| was used. One 
Monte Carlo sweep updates every torsion angle of the peptide once. 

It is known from our previous work that the global-minimum value of KONF90 energy 
for Met-enkephalin is Eqs = —12.2 kcal/mol fl^. The peptide has essentially a unique 



three-dimensional structure at temperatures T < 50 K, and the average energy is about 
—11 kcal/mol at T = 50 K 0. Hence, in the present work we always set T = 50 K (or, 
/? = 10.1 [ ]^cai/moi ]) probability weight factor. All simulations were started from 

completely random initial configurations (Hot Start). 

To demonstrate that thermalization is greatly enhanced in our ensemble, we first 
compare the "time series" of energy as a function of Monte Carlo sweep. In Fig. 1 we 
show the results from a regular canonical Monte Carlo simulation at temperature T = 50 
K (dotted curve) and those from a gereralized-ensemble simulation of the new algorithm 
(solid curve). Here, the weight we used for the latter simulation is given by Eq. ( |10|) 
with Hp = 19 and Eq = Eqs = —12.2 kcal/mol. For the canonical run the curve stays 
around the value E = —6 kcal/mol with small thermal fluctuations, reflecting the low- 
temperature nature. The run has apparently been trapped in a local minimum, since the 
mean energy at this temperature is < E >= —11.1 kcal/mol as found by a multicanonical 
simulation in Ref . . On the other hand, the simulation based on the new weight covers 
a much wider energy range than the canonical run. It is a random walk in energy space, 
which keeps the simulation from getting trapped in a local minimum. It indeed visits the 
ground-state region several times in 200,000 Monte Carlo sweeps. These properties are 
common features of generalized-ensemble methods. 

Since the simulation by the present algorithm samples a large range of energies, we 
can use the reweighting techniques to construct canonical distributions and calculate 
thermodynamic quantities over a wide temperature range. Following 10,000 sweeps for 
thermalization, we performed a single simulation of 1,000,000 Monte Carlo sweeps, storing 
the configuration information at every second sweep. We have set again Eq = —12.2 
kcal/mol and rip = 19 in the weight of Eq. (|TU|). /.From this production run one can 
calculate various thermodynamic quantities as a function of temperature. As examples 
we show the average energy and the specific heat in Fig. 2a and Fig. 2b, respectively. The 
specific heat here is defined by the following equation: 

^ = Tb It - ^ N ' ^^^^ 

where (= 5) is the number of amino-acid residues in the peptide. The harmonic 
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approximation holds at low temperatures, and by substituting Eq. (Q) into Eq. (|T6D, we 
have 

C=^ = 1.9. (17) 

Note that the curve in Fig. 2b approaches this value in the T ^ limit. The results from 
a multicanonical production run with the same statistics are also shown in the Figures 
for comparison. The results from both methods are in complete agreement. 

We now examine the dependence of the simulations on the values of the exponent m 
in our weight (see Eqs. (|^) and ([T0|) ) and demonstrate that m = is indeed the optimal 
choice. Setting Eq = Egs = —12.2 kcal/mol, we performed 10 independent simulation 
runs of 50,000 Monte Carlo sweeps with various choices of m. In Table I we list the lowest 
energies obtained during each of the 10 runs for five choices of m values: 9.5 (= ^), 14, 
19 (= Up), 50, and 100. The results from regular canonical simulations at T = 50 K with 
50,000 Monte Carlo sweeps are also listed in the Table for comparison. If m is chosen to 
be too small (e.g., m = 9.5), then the weight follows a power law in which the suppression 
for higher energy region is insufficient (see Eq. (^). As a result, the simulations tend to 
stay at high energies and fail to sample low-energy configurations. On the other hand, 
for too large a value of m (e.g., m = 100), the weight is too close to the canonical weight, 
and therefore the simulations will get trapped in local minima. It is clear from the Table 
that m = np is the optimal choice. In this case the simulations found the ground-state 
configurations 80 % of the time (8 runs out of 10 runs). This should be compared with 90 
%, 75 %, 80 %, and 40 % for multicanonical annealing, l//c-annealing, simulated tempering 
annealing, and simulated annealing algorithms, respectively, in simulations with the same 



number of Monte Carlo sweeps ||15 



To analyze the above results further, we calculated the actual probability distributions 
of energy for various values of m. This can be done by the reweighting techniques from 
the single production run of 1,000,000 Monte Carlo sweeps mentioned above (which is 
based on the weight of Eq. ([T0| ) with Eq = —12.2 kcal/mol and m = np = 19). The 
results are shown in Fig. 3a. By examining the Figure, we again find that m = np is the 
optimal choice. It yields to an energy distribution which has a pronounced peak around 
the mean energy value {< E > = —11.1 kcal/mol) at T = 50 K. At the same time. 



it has a tail to higher energies. This behavior is exactly what we were looking for and 



justifies our definition of weights in Eq. (10). 

The greatest advantage of the new method over other generalized-ensemble approaches 
is the simplicity of the weight factor. In multicanonical algorithms, l/fc-sampling, or 
simulated tempering, the explicit functional forms of the weights are not known a priori 
and they have to be determined numerically by iterations of trial simulations. This 
can be a formidable task in many cases. On the other hand, the weight factor of the 
present algorithm just depends on the knowledge of the global-minimum energy Eqs (see 
Eq. (0)). If its value is known, which is the case for some systems with frustration, the 
weight is completely determined. However, if Eqs is not known, we have to obtain its best 
estimate Eq. We can calculate the actual probability distributions of energy for various 
values of Eq by the reweighting techniques again. The results are shown in Fig. 3b. We see 
that for the system of Met-enkephalin, one needs the accuracy of about 1 ~ 2 kcal/mol 
in the estimate of the global- minimum energy Eqs in order for our new algorithm to 
be effective. This implication is supported by Table II where we list the lowest energies 
obtained during each of 10 independent simulation runs of 200,000 Monte Carlo sweeps 
with m = Up = 19. Four choices were considered for the Eq value: —12.2, — 13.2, — 14.2, 
and —15.2 kcal/mol. We remark that Eq has to underestimate Eqs to ensure that E — Eq 
can not become negative. Our data show again that an accuracy of 1 ~ 2 kcal/mol in the 
estimate of the global-minimum energy is required for Met-enkephalin. 

The use of our method therefore depends on the ability to find a good estimate for 
the ground-state energy Egs-, which is still much easier than the determination of the 
weights for other generalized-ensemble algorithms. In principle, such estimates can be 
found in an iterative way. Here, we give one of the effective iteration procedures. One 
first sets an initial guess of the optimal Eq which should be lower than Egs- One performs 
a simulation with the weight of the present method with small number of Monte Carlo 
sweeps. From this simulation one calculates the average energy < E >t at the chosen 
temperature T by the reweighting techniques. If < E > — Eq ^ ^ksT, one raises 
the value of Eq by a certain amount and repeats the short simulation. One iterates this 
process until < E > ~ Eq ^ksT. The search of the optimal Eq can be further 
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facilitated by information such as the average energy and the specific heat obtained from 
high temperature simulations. For Met-enkephalin the incorporation of such information 
gave a start value of Eq = —13.8 kcal/mol, which is already within the 2 kcal/mol accuracy 
required by our method (see Ref. [1^ for details). 

In summary, we have introduced a new generalized-ensemble algorithm for simulations 
of systems with frustration. We have demonstrated the effectiveness of the method by 
taking the example of the system of a small peptide, Met-enkephalin, which has a rough 
energy landscape with a huge number of local minima. The advantage of the new method 
lies in the fact that the determination of the probability weight factor is much simpler 
than in other generalized-ensemble approaches. 
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Table Captions: 



1. Lowest energy (in kcal/mol) obtained by the present method with several different 
choices of the free parameter m. The other free parameter Eq was fixed at the value 
of the global-minimum energy Eqs — —12.2 kcal/mol. The temperature was set to 
T = 50 K. The case for m = oo stands for a regular canonical run at T = 50 K. 
For all cases, the total number of Monte Carlo sweeps per run was 50,000. < E > 
is the average of the lowest energy obtained by the 10 runs (with the standard 
deviations in parentheses), and nos is the number of runs in which a conformation 
with E < —11.0 kcal/mol (the average energy at T = 50 K) was obtained. 

2. Lowest energy (in kcal/mol) obtained by the present method with several different 
choices of the free parameter Eq. The other free parameter m was fixed at the 
optimal value of np = 19, the number of degrees of freedom. The temperature was 
set to T = 50 K. For all cases, the total number of Monte Carlo sweeps per run was 
200,000. < £^ > is the average of the lowest energy obtained by the 10 runs (with 
the standard deviations in parentheses), and Hgs is the number of runs in which 
a conformation with E < —11.0 kcal/mol (the average energy at T = 50 K) was 
obtained. 
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Table I. 
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Table II. 
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gure Captions: 



1. Time scries of the total energy Etot (kcal/mol) from a regular canonical simulation 
at temperature T = 50 K (dotted curve) and that from a simulation of the present 
method with the parameters: Eg — —12.2 kcal/mol, m — np — 19, and T = 50 K 
(solid curve). 

2. Average energy (a) and specific heat (b) as a function of temperature. They were 
calculated by the reweighting techniques from a single simulation run of the present 
method with the parameters: Eg = —12.2 kcal/mol, m = np = 19, and T = 50 
K. The results from a multicanonical simulation are also shown for comparison. In 
both simulations (by the present method and by the multicanonical algorithm) the 
total number of Monte Carlo sweeps was 1,000,000. 

3. Distributions of energy for various values of the exponent m (a) and the global- 
minimum energy estimate Eq (b) in the present method. The ordinate for (a) 
is logarithmic. The results were obtained by the reweighting techniques from a 
single simulation run with the parameters: Eq — —12.2 kcal/mol, m = = 19, 
and T 50 K. The total number of Monte Carlo sweeps was 1,000,000. For (a) 
the regular canonical distribtion at T = 50 K as calculated by the reweighting 
techniques is also shown for comparison. 
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